Querying Hebrew Texts via Word Spotting
Abstract
We report recent results on word spotting (WS) in Hebrew historical texts, both manuscript and printed. The advantage of such a retrieval system is that it works directly on images, with no need for manual or computer transcription of the texts. The method allows extremely rapid querying while maintaining high accuracy; it should therefore be considered an important tool in historical textual research.
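The abstract does not say how the matching is done, so the following is only a minimal sketch of query-by-example word spotting in general, not the authors' method. It assumes word images have already been segmented and binarized (1 = ink, 0 = background), reduces each image to a column-wise ink profile, and ranks candidates by dynamic time warping (DTW) distance to the query profile. All names (`column_profile`, `dtw_distance`, `spot`) and the toy images are hypothetical.

```python
from math import inf


def column_profile(image):
    """Return the ink count of each pixel column of a binary word image."""
    return [sum(column) for column in zip(*image)]


def dtw_distance(a, b):
    """Length-normalized dynamic time warping distance between two sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m] / (n + m)


def spot(query, candidates):
    """Rank candidate word images from most to least similar to the query."""
    q = column_profile(query)
    scored = [
        (dtw_distance(q, column_profile(img)), name)
        for name, img in candidates.items()
    ]
    return [name for _, name in sorted(scored)]


# Toy binary images (assumed data): the first candidate is a horizontally
# stretched copy of the query, the second has a different shape.
QUERY = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
CANDIDATES = {
    "same_word_wider": [
        [0, 1, 1, 1, 0],
        [1, 1, 1, 1, 1],
        [0, 1, 1, 1, 0],
    ],
    "other_word": [
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [1, 0, 0, 1],
    ],
}

ranking = spot(QUERY, CANDIDATES)
print(ranking)
```

No transcription is involved at any point: the query is itself an image, and the ranking depends only on pixel-level shape. DTW is used here because it tolerates horizontal stretching, so the wider copy of the query ranks first; a real system would need far richer features than a single ink profile.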
Related resources
Lexical Affect Sensing: Word Spotting Revisited
Recently, there has been considerable interest in the recognition of affect from texts. In this paper, we revisit the word spotting technique for affect sensing in short texts. To this end, we extract words from different affect dictionaries and explore the performance of various strategies for sensing affect.
Automatic Transliteration of Judeo-Arabic Texts into Arabic Script
The Judeo-Arabic languages comprise a set of dialects spoken and written by Jewish communities living in Arab countries, mainly during the Middle Ages. Judeo-Arabic is typically written in Hebrew letters, enriched with various diacritic marks. The Judeo-Arabic spoken and written by any particular Jewish community is similar to the Arabic dialect used by their local Muslim community. In additi...
A survey of document image word spotting techniques
Vast collections of documents available in image format need to be indexed for information retrieval purposes. In this framework, word spotting is an alternative solution to optical character recognition (OCR), which is rather inefficient for recognizing text of degraded quality and unknown fonts usually appearing in printed text, or writing style variations in handwritten documents. Over the p...
Identifying translationese at the word and sub-word level
We use text classification to distinguish automatically between original and translated texts in Hebrew, a morphologically complex language. To this end, we design several linguistically informed feature sets that capture word-level and sub-word-level (in particular, morphological) properties of Hebrew. Such features are abstract enough to allow for the development of accurate, robust classifie...
A Morphological, Syntactic, and Semantic Search Engine for Hebrew Texts
This article describes the construction of a morphological, syntactic and semantic analyzer to operate a high-grade search engine for Hebrew texts. A good search engine must be complete and accurate. In Hebrew or Arabic script most of the vowels are not written, many particles are attached to the word without space, a double consonant is written with one letter, and some letters signify both vo...